相对属性(RA),参考在特定属性的强度上的两个图像上的偏好,可以使由于其丰富的语义信息来实现良好的图像到图像转换。然而,基于RAS的现有工作未能调和细粒度翻译的目标以及高质量一代的目标。我们提出了一个新的模型之旅,以协调这两个目标,以获得高质量的细粒度翻译。特别是,我们同时培训了两个模块:一个发电机,它将输入图像转换为所需图像,具有相对于感兴趣的属性的平滑微妙变化;和排名由输入图像和所需图像组成的竞争偏好的排名。竞争对手的偏好是指对抗性排名过程:(1)排名师在所需属性方面认为所需图像和输入图像之间没有差异; (2)发电机欺骗排名师以相信所需图像根据需要在输入图像上改变属性。介绍了RAS成对的真实图像,以指导排名仪对仅对感兴趣的属性进行排名对。通过有效的排名,发电机将通过产生与输入图像相比,通过产生所需改变的高质量图像来“赢得”对抗游戏。两个面部图像数据集和一个鞋图像数据集的实验表明,我们的旅行实现了最先进的导致生成高保真图像,这表现出对感兴趣的属性的平滑变化。
translated by 谷歌翻译
本文提出了差异性批判性生成对抗网络(DICGAN),以了解只有部分而不是整个数据集具有所需属性时用户呈现数据的分布。 Dicgan生成了满足用户期望的所需数据,并可以协助设计具有所需特性的生物产品。现有方法首先选择所需的样品,然后在选定样品上训练常规甘斯以得出用户呈现的数据分布。但是,所需数据的选择取决于整个数据集的全球知识和监督。 Dicgan介绍了一个差异评论家,该评论家从成对的偏好中学习,这些偏好是本地知识,可以在培训数据的一部分中定义。批评家是通过定义与瓦斯坦斯坦·甘(Wasserstein Gan)批评家的额外排名损失来建立的。它赋予每对样本之间的评论值差异,并具有用户喜好,并指导所需数据的生成而不是整个数据。为了获得更有效的解决方案以确保数据质量,我们将Dicgan进一步重新重新将其作为约束优化问题,基于理论上证明了我们的Dicgan的收敛性。对具有各种应用程序的各种数据集进行的广泛实验表明,我们的Dicgan在学习用户呈现的数据分布方面取得了最新的性能,尤其是在不足的所需数据和有限的监督下。
translated by 谷歌翻译
Knowledge graph embedding (KGE), which maps entities and relations in a knowledge graph into continuous vector spaces, has achieved great success in predicting missing links in knowledge graphs. However, knowledge graphs often contain incomplete triples that are difficult to inductively infer by KGEs. To address this challenge, we resort to analogical inference and propose a novel and general self-supervised framework AnKGE to enhance KGE models with analogical inference capability. We propose an analogical object retriever that retrieves appropriate analogical objects from entity-level, relation-level, and triple-level. And in AnKGE, we train an analogy function for each level of analogical inference with the original element embedding from a well-trained KGE model as input, which outputs the analogical object embedding. In order to combine inductive inference capability from the original KGE model and analogical inference capability enhanced by AnKGE, we interpolate the analogy score with the base model score and introduce the adaptive weights in the score function for prediction. Through extensive experiments on FB15k-237 and WN18RR datasets, we show that AnKGE achieves competitive results on link prediction task and well performs analogical inference.
translated by 谷歌翻译
Temporal sentence grounding (TSG) aims to identify the temporal boundary of a specific segment from an untrimmed video by a sentence query. All existing works first utilize a sparse sampling strategy to extract a fixed number of video frames and then conduct multi-modal interactions with query sentence for reasoning. However, we argue that these methods have overlooked two indispensable issues: 1) Boundary-bias: The annotated target segment generally refers to two specific frames as corresponding start and end timestamps. The video downsampling process may lose these two frames and take the adjacent irrelevant frames as new boundaries. 2) Reasoning-bias: Such incorrect new boundary frames also lead to the reasoning bias during frame-query interaction, reducing the generalization ability of model. To alleviate above limitations, in this paper, we propose a novel Siamese Sampling and Reasoning Network (SSRN) for TSG, which introduces a siamese sampling mechanism to generate additional contextual frames to enrich and refine the new boundaries. Specifically, a reasoning strategy is developed to learn the inter-relationship among these frames and generate soft labels on boundaries for more accurate frame-query reasoning. Such mechanism is also able to supplement the absent consecutive visual semantics to the sampled sparse frames for fine-grained activity understanding. Extensive experiments demonstrate the effectiveness of SSRN on three challenging datasets.
translated by 谷歌翻译
Normalizing flow is a class of deep generative models for efficient sampling and density estimation. In practice, the flow often appears as a chain of invertible neural network blocks; to facilitate training, existing works have regularized flow trajectories and designed special network architectures. The current paper develops a neural ODE flow network inspired by the Jordan-Kinderleherer-Otto (JKO) scheme, which allows efficient block-wise training of the residual blocks and avoids inner loops of score matching or variational learning. As the JKO scheme unfolds the dynamic of gradient flow, the proposed model naturally stacks residual network blocks one-by-one, reducing the memory load and difficulty of performing end-to-end training of deep flow networks. We also develop adaptive time reparameterization of the flow network with a progressive refinement of the trajectory in probability space, which improves the model training efficiency and accuracy in practice. Using numerical experiments with synthetic and real data, we show that the proposed JKO-iFlow model achieves similar or better performance in generating new samples compared with existing flow and diffusion models at a significantly reduced computational and memory cost.
translated by 谷歌翻译
Score-based diffusion models have captured widespread attention and funded fast progress of recent vision generative tasks. In this paper, we focus on diffusion model backbone which has been much neglected before. We systematically explore vision Transformers as diffusion learners for various generative tasks. With our improvements the performance of vanilla ViT-based backbone (IU-ViT) is boosted to be on par with traditional U-Net-based methods. We further provide a hypothesis on the implication of disentangling the generative backbone as an encoder-decoder structure and show proof-of-concept experiments verifying the effectiveness of a stronger encoder for generative tasks with ASymmetriC ENcoder Decoder (ASCEND). Our improvements achieve competitive results on CIFAR-10, CelebA, LSUN, CUB Bird and large-resolution text-to-image tasks. To the best of our knowledge, we are the first to successfully train a single diffusion model on text-to-image task beyond 64x64 resolution. We hope this will motivate people to rethink the modeling choices and the training pipelines for diffusion-based generative models.
translated by 谷歌翻译
This paper studies the distribution estimation of contaminated data by the MoM-GAN method, which combines generative adversarial net (GAN) and median-of-mean (MoM) estimation. We use a deep neural network (DNN) with a ReLU activation function to model the generator and discriminator of the GAN. Theoretically, we derive a non-asymptotic error bound for the DNN-based MoM-GAN estimator measured by integral probability metrics with the $b$-smoothness H\"{o}lder class. The error bound decreases essentially as $n^{-b/p}\vee n^{-1/2}$, where $n$ and $p$ are the sample size and the dimension of input data. We give an algorithm for the MoM-GAN method and implement it through two real applications. The numerical results show that the MoM-GAN outperforms other competitive methods when dealing with contaminated data.
translated by 谷歌翻译
Currently, most deep learning methods cannot solve the problem of scarcity of industrial product defect samples and significant differences in characteristics. This paper proposes an unsupervised defect detection algorithm based on a reconstruction network, which is realized using only a large number of easily obtained defect-free sample data. The network includes two parts: image reconstruction and surface defect area detection. The reconstruction network is designed through a fully convolutional autoencoder with a lightweight structure. Only a small number of normal samples are used for training so that the reconstruction network can be A defect-free reconstructed image is generated. A function combining structural loss and $\mathit{L}1$ loss is proposed as the loss function of the reconstruction network to solve the problem of poor detection of irregular texture surface defects. Further, the residual of the reconstructed image and the image to be tested is used as the possible region of the defect, and conventional image operations can realize the location of the fault. The unsupervised defect detection algorithm of the proposed reconstruction network is used on multiple defect image sample sets. Compared with other similar algorithms, the results show that the unsupervised defect detection algorithm of the reconstructed network has strong robustness and accuracy.
translated by 谷歌翻译
As machine learning being used increasingly in making high-stakes decisions, an arising challenge is to avoid unfair AI systems that lead to discriminatory decisions for protected population. A direct approach for obtaining a fair predictive model is to train the model through optimizing its prediction performance subject to fairness constraints, which achieves Pareto efficiency when trading off performance against fairness. Among various fairness metrics, the ones based on the area under the ROC curve (AUC) are emerging recently because they are threshold-agnostic and effective for unbalanced data. In this work, we formulate the training problem of a fairness-aware machine learning model as an AUC optimization problem subject to a class of AUC-based fairness constraints. This problem can be reformulated as a min-max optimization problem with min-max constraints, which we solve by stochastic first-order methods based on a new Bregman divergence designed for the special structure of the problem. We numerically demonstrate the effectiveness of our approach on real-world data under different fairness metrics.
translated by 谷歌翻译
Salient object detection (SOD) aims to determine the most visually attractive objects in an image. With the development of virtual reality technology, 360{\deg} omnidirectional image has been widely used, but the SOD task in 360{\deg} omnidirectional image is seldom studied due to its severe distortions and complex scenes. In this paper, we propose a Multi-Projection Fusion and Refinement Network (MPFR-Net) to detect the salient objects in 360{\deg} omnidirectional image. Different from the existing methods, the equirectangular projection image and four corresponding cube-unfolding images are embedded into the network simultaneously as inputs, where the cube-unfolding images not only provide supplementary information for equirectangular projection image, but also ensure the object integrity of the cube-map projection. In order to make full use of these two projection modes, a Dynamic Weighting Fusion (DWF) module is designed to adaptively integrate the features of different projections in a complementary and dynamic manner from the perspective of inter and intra features. Furthermore, in order to fully explore the way of interaction between encoder and decoder features, a Filtration and Refinement (FR) module is designed to suppress the redundant information between the feature itself and the feature. Experimental results on two omnidirectional datasets demonstrate that the proposed approach outperforms the state-of-the-art methods both qualitatively and quantitatively.
translated by 谷歌翻译